A Template-Based Approach to Summarize XML Collections
Abstract
Existing summarization approaches for XML concentrate on extracting common structure and compressing the data in order to optimize storage and speed up queries. Neither compression nor structure extraction suffices for advanced, content-based summarization tasks. We present a set of tools for semi-automatic summarization of XML collections, in which the user specifies the semantically relevant features of an XML collection in a template and defines rules for summarization. The system assists the user in generating one or more such templates, selects the templates applicable to a given collection, and applies them for automatic summarization. In experiments on the INEX collection (among others), we investigate the merits and limitations of our approach.
Similar resources
ToXgene: An extensible template-based data generator for XML
Synthetic collections of XML documents are useful in many applications, such as benchmarking (e.g., XMach-1, XMark) and algorithm testing and evaluation. We present ToXgene, a template-based generator for large, consistent collections of synthetic XML documents. Templates are annotated XML Schema specifications describing both the structure and the content of the data to be generated. Our tool ...
Using XML for flexible data entry in healthcare: example use for pathology
This paper describes a pragmatic, generic and flexible approach to managing XML-structured data, using pathology reports as an example. The flexibility of this approach is based on a template concept. The template describes the documents of a given (clinical) domain in terms of structure and user interface requirements. The template enables a so-called document manager to provide a corre...
Code Generation via XML/XSLT vs. CC-Based Approaches
www.XML-JOURNAL.com, August 2002. Typically, the purpose of the software subsystem is to generate a concrete implementation from declarative models. This could be viewed as an extension of MVC (Model-View-Controller) architecture by incorporating a generator component (i.e., MVCG). Adopting a generative approach in software development is a goal cherished by many application developers. Why writ...
An XML-based Approach for the Presentation and Exploitation of Extracted Information
We present an approach for exploiting knowledge from documents on the web. It is based on the integration of XML technologies with robust tools for natural language processing. The overall goal is to offer a knowledge engineer as much support as possible for the task of extracting and formalizing knowledge from document collections.
A Clustered Index Approach to Distributed XPath
Supporting top-k queries over distributed collections of schemaless XML data poses two challenges. While XML supports expressive query languages such as XPath and XQuery, writing an appropriate query in these languages requires schema knowledge, which may not be available in distributed systems with autonomous and dynamic sources. Thus, there is a need for approximate query processing. Furthe...